Abstract

We show how Benford's Law (BL) for the first, second, ..., digits emerges from the distribution
of digits of numbers of the type $a^R$, with $a$ any real positive number ($a \neq 1$) and $R$ a set of real numbers
uniformly distributed in an interval $[P\log_a 10, (P+1)\log_a 10)$ for any integer $P$. The result is
shown to be number base and scale invariant. A rule based on the mantissas of the logarithms
allows one to determine whether a set of numbers obeys BL or not. We show that BL applies
to numbers obtained from the multiplication or division of numbers drawn from any distribution.
We also argue that most real-life sets that obey BL do so because they are obtained from
such basic arithmetic operations. We show that all these arguments were discussed in the original
paper by Simon Newcomb in 1881, where he presented Benford's Law.
I. INTRODUCTION.



Benford's Law (BL) asserts that in certain sets of numbers, most of them of real-life
origin, the first digit is distributed non-uniformly in the form

$$P_B^{(1)}(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad (1)$$

where $d$ is the first digit of the number and $\log_{10}$ is the logarithm base 10. In other words,
$P_B^{(1)}(d)$ is the fraction of the numbers with first digit $d$ in the given set. There are also forms
of Benford's Law for the second, third, etc., digits, namely $P_B^{(n)}(d)$. Table I shows the values of
$P_B^{(1)}(d)$ for $d = 1, 2, \ldots, 9$.



d               1       2       3       4       5       6       7       8       9
$P_B^{(1)}(d)$  0.3010  0.1761  0.1249  0.0969  0.0792  0.0669  0.0580  0.0512  0.0458

TABLE I. First digit Benford law.
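The entries of Table I follow directly from Eq. (1). A minimal Python sketch that rebuilds them (the name `benford_first_digit` is illustrative, not taken from any library):

```python
import math


def benford_first_digit(d: int) -> float:
    """First-digit Benford probability, Eq. (1): log10(1 + 1/d), for d = 1..9."""
    if not 1 <= d <= 9:
        raise ValueError("the first digit must be in 1..9")
    return math.log10(1.0 + 1.0 / d)


# Table I, rounded to four decimals.
TABLE_I = {d: round(benford_first_digit(d), 4) for d in range(1, 10)}

for d, p in TABLE_I.items():
    print(f"{d}  {p:.4f}")
```

The nine probabilities add up to exactly 1 because the sum telescopes: $\sum_{d=1}^{9}\log_{10}\frac{d+1}{d} = \log_{10} 10 = 1$.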



BL has been found to be obeyed quite well in a variety of situations, many of them
checked by Frank Benford himself [1]. These sets include population censuses, stock market
indices, utility bills, tax returns, areas of rivers, physical and mathematical constants,
and molecular weights, among others [1, 2]. At first sight, the Law is certainly baffling and
counterintuitive [3], since one's naive intuition is that the digits of numbers should be uniformly or
randomly distributed. Although Frank Benford has been credited with the law for his work
of 1938 [1], the law was originally discovered by the astronomer Simon Newcomb in 1881 [1] as
a follow-up of the observation that the pages of tables of logarithms in his university library
were worn out following BL, as given by Eq. (1). What is rarely told is that Newcomb also
derived Benford's Law. His demonstration may now look obscure to us, and probably just
sketchy, because he used arguments that were not so difficult for those familiar with
log tables ... and we certainly are not. We shall advance a plausible explanation of
Newcomb's observation of the worn pages of the log tables and argue why many sets of
real-life origin also obey BL; as it turns out, this argument was also used by Newcomb.

We shall first prove a general result that appears to be known already [2] [?] [?] [7] [8],
although to the best of our knowledge it has not been shown explicitly in the form
presented here; we shall see that it yields exactly the Benford distributions for all digits, and
that it allows one to conclude that BL is scale and number base invariant [5]. We demonstrate that if $R$ is
a set of real numbers uniformly distributed in $[P\log_a 10, (P+1)\log_a 10)$, then the distributions of the digits of $a^R$ obey BL
for any real positive number $a$. Then, we discuss the main result of Newcomb, namely, the
fact that a given set of numbers obeys BL if the mantissas of their logarithms are uniformly
distributed. We then analyze two main types of sequences of numbers that obey BL: those
obtained from the multiplication of numbers drawn from any distribution, and those
that are part of a generalized geometric progression of numbers uniformly distributed in an arbitrary
interval.

II. A GENERAL RESULT CONCERNING BENFORD'S LAW. 

Let $\{R_1, R_2, \ldots, R_N\}$ be a sequence of real numbers drawn from a uniform distribution
in the interval $R_i \in [P\log_a 10, (P+1)\log_a 10)$, with $P$ any integer. Then, the first, second, ...,
digit distributions of the sequence $\{a^{R_1}, a^{R_2}, \ldots, a^{R_N}\}$, with $a$ any real positive number ($a \neq 1$),
approach Benford's Law, Eq. (1), and its generalizations, as $N \to \infty$.

Let us look first at the first digit distribution. In Fig. 1 we plot $a^R$ vs $R$ in a semi-log
(base $a$) scale. In this graph, $a^R$ vs $R$ appears as a straight line. Now take on the $R$-axis
the sequence $\{R_1, R_2, \ldots, R_N\}$ within the interval $[P\log_a 10, (P+1)\log_a 10)$. Take any
number of the sequence, say $R_i$. Then, in the logarithmic scale, $\log_a a^{R_i} = R_i$ must lie within one
of the following "bins": $b_1$, the interval $[\log_a(1\times 10^P), \log_a(2\times 10^P))$; or $b_2$, the interval
$[\log_a(2\times 10^P), \log_a(3\times 10^P))$; ...; or $b_9$, the interval $[\log_a(9\times 10^P), \log_a(10\times 10^P))$.
The main point is this: if $\log_a a^{R_i}$ lies within the bin $b_d$, then the first digit of $a^{R_i}$ is $d$.




FIG. 1: $a^R$ vs $R$ in semi-log scale. The dotted line shows an example of an arbitrary point $R_i$ in
the chosen interval, such that the first digit of $a^{R_i}$ is 2 because it falls in bin $b_2$.
Since $R_i$ was drawn from a uniform distribution, it has the same chance of taking any value
within the interval $[P\log_a 10, (P+1)\log_a 10)$; therefore, the probability that $\log_a a^{R_i}$
falls within the bin $b_d$ is the length of the bin $b_d$ divided by the length of the full interval,
namely

$$\mathrm{Prob}\left(\log_a a^{R_i} \in b_d\right) = \frac{\log_a\left((d+1)\times 10^P\right) - \log_a\left(d\times 10^P\right)}{\log_a\left(10^{P+1}\right) - \log_a\left(10^P\right)} = \frac{\log_a\left(1 + \frac{1}{d}\right)}{\log_a 10} = \log_{10}\left(1 + \frac{1}{d}\right). \qquad (2)$$



This has the form of Benford's Law for the first digit $P_B^{(1)}(d)$, Eq. (1). Thus, as $N \to \infty$,
the first digit distribution of the sequence $\{a^{R_1}, a^{R_2}, \ldots, a^{R_N}\}$ will approach $P_B^{(1)}(d)$. We
shall call this the General Result (GR). Note that GR is independent of the integer value $P$
that fixes the interval, as long as the sequence is uniformly distributed. Clearly, the result holds
if we change the interval to $[P\log_a 10, (P+M)\log_a 10)$ with $M$ any positive integer, since each of the $M$
decades then contributes equally. Note that we never used the fact that $a$, $R$, or $d$ are numbers in base 10;
the graph in Fig. 1 is plotted for numbers in base 10 for illustration purposes, but the result would have been the
same for any number base. Thus, we conclude that BL is base invariant, i.e. valid for any
number base $K$ (with 10 replaced by $K$ throughout, so that $P_B^{(1)}(d) = \log_K(1 + 1/d)$), with digits
$d = 0, 1, 2, \ldots, K-1$ ($d \geq 1$ for the first digit).
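GR lends itself to a direct Monte Carlo check. The sketch below is a minimal illustration with our own helper names (`first_digit`, `gr_first_digit_freqs`, `max_deviation`) and parameter choices: it draws the $R_i$ uniformly in $[P\log_a 10, (P+1)\log_a 10)$ and tallies the first digits of $a^{R_i}$. It assumes $a > 0$, $a \neq 1$, and a modest $P$, so that $a^{R_i}$ stays within floating-point range.

```python
import math
import random

# Exact first-digit Benford probabilities, Eq. (1), for d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x: float) -> int:
    """Leading decimal digit of x > 0, read off from the mantissa of log10(x)."""
    mantissa = math.log10(x) % 1.0
    # min() guards against 10**mantissa rounding up to exactly 10.0.
    return min(int(10 ** mantissa), 9)


def gr_first_digit_freqs(a: float, P: int, n: int, seed: int = 0) -> list[float]:
    """Draw R_i uniformly in [P log_a 10, (P+1) log_a 10) and return the
    first-digit frequencies of a**R_i for d = 1..9 (requires a > 0, a != 1)."""
    rng = random.Random(seed)
    width = math.log(10, a)  # log_a 10: the length of one decade on the R-axis
    counts = [0] * 9
    for _ in range(n):
        r = rng.uniform(P * width, (P + 1) * width)
        counts[first_digit(a ** r) - 1] += 1
    return [c / n for c in counts]


def max_deviation(freqs: list[float]) -> float:
    """Largest absolute difference from the Benford probabilities."""
    return max(abs(f - b) for f, b in zip(freqs, BENFORD))
```

For example, `gr_first_digit_freqs(2.0, 3, 200_000)` can be compared entry by entry with `BENFORD`; changing `a` or `P` should leave the frequencies close to Eq. (1), up to sampling noise of order $1/\sqrt{N}$.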

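The second, third, ..., digit distributions follow right away with the same argument. For
instance, the probability that the second digit of $a^{R_i}$ is $d$ equals the sum of the lengths of
the "sub-bins"

$$b_{1d} = \left[\log_a\left(\left(1 + \tfrac{d}{10}\right)\times 10^P\right),\ \log_a\left(\left(1 + \tfrac{d+1}{10}\right)\times 10^P\right)\right),$$
$$b_{2d} = \left[\log_a\left(\left(2 + \tfrac{d}{10}\right)\times 10^P\right),\ \log_a\left(\left(2 + \tfrac{d+1}{10}\right)\times 10^P\right)\right),\ \ldots,$$
$$b_{9d} = \left[\log_a\left(\left(9 + \tfrac{d}{10}\right)\times 10^P\right),\ \log_a\left(\left(9 + \tfrac{d+1}{10}\right)\times 10^P\right)\right),$$

divided by the length of the full interval, where $d$ can now take all values $0, 1, \ldots, 9$. Thus, the second digit distribution is

$$P_B^{(2)}(d) = \sum_{k=1}^{9} \frac{\log_a\left(\left(k + \frac{d+1}{10}\right)\times 10^P\right) - \log_a\left(\left(k + \frac{d}{10}\right)\times 10^P\right)}{\log_a\left(10^{P+1}\right) - \log_a\left(10^P\right)} = \sum_{k=1}^{9} \log_{10}\left(1 + \frac{1}{10k + d}\right). \qquad (3)$$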
The argument is easily generalized to the $n$-th digit ($n \geq 2$), and the result is

$$P_B^{(n)}(d) = \sum_{m=10^{n-2}}^{10^{n-1}-1} \log_{10}\left(1 + \frac{1}{10m + d}\right). \qquad (4)$$
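Eq. (4) is straightforward to evaluate numerically. In this sketch (the function name is ours), `n = 1` falls back to Eq. (1); for `n >= 2` the sum runs over all values $m$ of the preceding $n-1$ digits:

```python
import math


def benford_nth_digit(n: int, d: int) -> float:
    """Probability that the n-th significant decimal digit equals d.

    n = 1 uses Eq. (1) (d = 1..9); n >= 2 uses Eq. (4) (d = 0..9), summing
    over every value m of the first n-1 digits, 10**(n-2) <= m < 10**(n-1).
    """
    if n == 1:
        if not 1 <= d <= 9:
            raise ValueError("the first digit must be in 1..9")
        return math.log10(1 + 1 / d)
    if n < 1 or not 0 <= d <= 9:
        raise ValueError("need n >= 1 and d in 0..9")
    return sum(math.log10(1 + 1 / (10 * m + d))
               for m in range(10 ** (n - 2), 10 ** (n - 1)))


# Second-digit distribution, Eq. (3).
for d in range(10):
    print(d, round(benford_nth_digit(2, d), 4))
```

Summing `benford_nth_digit(n, d)` over `d` gives 1 for every `n` (the sum telescopes), and already for `n = 4` every value is within about $10^{-3}$ of $1/10$, in line with Newcomb's remark that the higher digits are almost uniformly distributed.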

The present derivation is extremely simple. Although Newcomb never wrote the formulas
for $P_B^{(n)}(d)$, given his mastery of log tables and numerical analysis [9], it is clear that he knew
them, since he writes the values of $P_B^{(1)}(d)$ and $P_B^{(2)}(d)$ explicitly and mentions how the
third and higher digit distributions behave (they are almost uniform). In view of Newcomb's most important
result, discussed below, it seems to the author that he knew a derivation
very similar to this one. Benford's Law and its generalizations have been rigorously shown
by Hill [?] to follow as a consequence of base invariance of the underlying law. We make no
pretense of such mathematical rigour here; rather, we wish to show its simplicity to a wider
audience.

A. Scale invariance. 

A very important property of BL that follows from GR above is the fact that BL is scale
invariant [6]. Add to the values $R_i$ any constant value $c$. This is equivalent to considering a
uniform sequence of numbers $R_i$ in the interval $[c + P\log_a 10, c + (P+1)\log_a 10)$. Referring
to Fig. 1, one can see that in the semi-log graph this amounts to shifting the interval on
the ordinate by a constant factor $a^c$; one also sees, however, that the sizes of the bins $b_d$
remain unchanged. Since the shifted interval still has length $\log_a 10$, i.e. it still spans exactly one
decade of $a^R$, the parts of the bins it covers add up, decade by decade, to the full bins $b_d$, and the
argument leading to Eq. (2) goes through unchanged. Thus, the sequence $\{a^c a^{R_1}, a^c a^{R_2}, \ldots, a^c a^{R_N}\}$
also obeys BL. But this new sequence is the same as the original one $\{a^{R_1}, a^{R_2}, \ldots, a^{R_N}\}$
multiplied by a constant, arbitrary, factor $a^c$.
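A minimal numerical illustration of scale invariance, assuming a Benford sample generated as in GR with $a = 10$ (i.e. $10^{R}$ with $R$ uniform in $[0, 1)$) and an arbitrary rescaling constant (2.54 here, chosen only as an example of a change of units). For contrast, the same test is applied to numbers uniform in $[1, 10)$, which do not obey BL and whose digit distribution does change under rescaling. Helper names are ours.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x: float) -> int:
    """Leading decimal digit of x > 0, from the mantissa of log10(x)."""
    return min(int(10 ** (math.log10(x) % 1.0)), 9)


def digit_freqs(values: list[float]) -> list[float]:
    """Fraction of values whose first digit is d, for d = 1..9."""
    counts = [0] * 9
    for x in values:
        counts[first_digit(x) - 1] += 1
    return [c / len(values) for c in counts]


def max_deviation(freqs: list[float]) -> float:
    """Largest absolute difference from the Benford probabilities."""
    return max(abs(f - b) for f, b in zip(freqs, BENFORD))


rng = random.Random(1)
N = 200_000

# GR with a = 10: 10**R with R uniform in [0, 1) obeys BL ...
benford_sample = [10 ** rng.random() for _ in range(N)]
# ... and so does the same sample after an arbitrary change of scale.
benford_scaled = [2.54 * x for x in benford_sample]

# Contrast: numbers uniform in [1, 10) do not obey BL; doubling them moves
# every value into [2, 20), so the share of leading 1s jumps from 1/9 to 5/9.
uniform_sample = [rng.uniform(1.0, 10.0) for _ in range(N)]
uniform_scaled = [2.0 * x for x in uniform_sample]
```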

B. The mantissa rule. 

The sequences or sets of numbers that usually follow BL are not of the form $\{a^R\}$. So,
one may ask for a rule that tells us whether a given sequence follows BL or not. The answer
was also given by Newcomb in his two-page paper. We shall now demonstrate that for
a given sequence of positive numbers $\{A_1, A_2, \ldots, A_N\}$, if the mantissas of the logarithms of the $A_i$,
namely of $\{\log_{10} A_1, \log_{10} A_2, \ldots, \log_{10} A_N\}$, are uniformly distributed, then the sequence
$\{A_1, A_2, \ldots, A_N\}$ obeys BL. Before we give the demonstration, we note that GR can be
restated more simply for the case $a = 10$, namely: if a sequence of numbers $R_i$ is uniformly
distributed in the interval $[0, 1)$, the sequence $\{10^{R_i}\}$ follows BL. We use this form below.
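Both the restated GR and the mantissa rule can be illustrated numerically. In this minimal sketch (helper names are ours), `mantissa` computes the fractional part of $\log_{10} x$, `first_digit` reads the leading digit off $10^{m}$, and `mantissa_histogram` bins the mantissas on $[0, 1)$; the two sample sets are illustrative choices, not data from the paper.

```python
import math
import random


def mantissa(x: float) -> float:
    """Fractional part of log10(x) for x > 0; in [0, 1) up to float rounding."""
    return math.log10(x) % 1.0


def first_digit(x: float) -> int:
    """The integer part of log10(x) only moves the decimal point, so the
    digits of x are those of 10**mantissa(x), a number in [1, 10)."""
    return min(int(10 ** mantissa(x)), 9)


def mantissa_histogram(values: list[float], bins: int = 10) -> list[float]:
    """Fraction of mantissas in each of `bins` equal sub-intervals of [0, 1).
    A flat histogram signals, by the mantissa rule, that the values obey BL."""
    counts = [0] * bins
    for x in values:
        counts[min(int(mantissa(x) * bins), bins - 1)] += 1
    return [c / len(values) for c in counts]


rng = random.Random(7)
# Random decimal-point positions plus uniform mantissas: obeys BL.
benford_like = [10 ** (rng.randint(-3, 3) + rng.random()) for _ in range(100_000)]
# Uniform in [1, 10): mantissas crowd toward 1, so this set does not obey BL.
uniform_like = [rng.uniform(1.0, 10.0) for _ in range(100_000)]
```

The first histogram comes out flat to within sampling noise, while the second rises steeply toward 1, so only the first set passes the mantissa test.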
The demonstration can be done by writing the log of $A_i$ as

$$\log_{10} A_i = C(A_i) + m(A_i), \qquad (5)$$

where $C(A_i)$ is an integer, the so-called characteristic of the log, and $m(A_i)$ is the fractional
part of the logarithm, or mantissa. Note that by definition the mantissas of logarithms base
10 lie within the interval $[0, 1)$. It is clear, then, that when taking the "antilogarithm"
$10^{\log_{10} A_i} = 10^{C(A_i)}\,10^{m(A_i)}$, the digits of $A_i$ are determined only by $10^{m(A_i)}$, since the
factor $10^{C(A_i)}$ just determines the position of the decimal point. Thus, the distribution
of digits is determined by the sequence of the mantissas only, namely by the
sequence $\{m(A_1), m(A_2), \ldots, m(A_N)\}$. Hence, if the latter are uniformly distributed, by GR
the sequence $\{10^{m(A_1)}, 10^{m(A_2)}, \ldots, 10^{m(A_N)}\}$ obeys BL and, therefore, so does the sequence
$\{A_1, A_2, \ldots, A_N\}$.

This result is very useful since it allows us to check whether a sequence of numbers obeys BL by
looking at the distribution of the mantissas of their logarithms. This is a simple operational
rule, as opposed to checking the digits themselves.

III. SOME SEQUENCES THAT OBEY BL.

The next question is which types of sequences or sets of numbers follow BL. Answering
this exhaustively appears to be a difficult task. Here, we discuss two general types of
sequences that can be shown quite clearly to obey BL. With these two, we shall conjecture
about the general case. We discuss these cases below.

A. Products of variables with arbitrary distributions.

Consider the set of numbers $\{Q_1, Q_2, \ldots, Q_N\}$, with each $Q_i$ given by the product of $M$ numbers,

$$Q_i = R_1^{(i)} R_2^{(i)} \cdots R_M^{(i)}, \qquad (6)$$
where the $R_j^{(i)}$ are the absolute values of numbers drawn from an arbitrary distribution (up to a
requirement to be given below). We now show that in the limit $N \to \infty$ and $M \to \infty$, the
sequence $\{Q_1, Q_2, \ldots, Q_N\}$ obeys BL.

The idea is to use the mantissa rule. For this, we consider the log base 10 of the numbers
$Q_i$,

$$\log_{10} Q_i = \log_{10} R_1^{(i)} + \log_{10} R_2^{(i)} + \cdots + \log_{10} R_M^{(i)}. \qquad (7)$$

We now introduce the requirement that the distribution of the logarithms of the numbers
$R$ have finite first and second moments. Then, in the limit $M \to \infty$, by the Central Limit
Theorem (CLT) [13], the distribution of $\log_{10} Q$ tends to the normal distribution. That is, the
values of $\log_{10} Q$ are distributed as

$$p(\log_{10} Q) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(\log_{10} Q - \log_{10} Q_0)^2/2\sigma^2}. \qquad (8)$$

Note that this is not the log-normal distribution, but simply the normal distribution for the
variable $\log_{10} Q$. The centroid is $\log_{10} Q_0 \approx M c_0$ and the width is $\sigma \approx \sqrt{M}\,\sigma_0$, where $c_0$ and $\sigma_0$ are
the mean and the standard deviation (obtained from the first and second moments) of the distribution of the
logarithms of the numbers $R$. This point will be further discussed below.

We proceed to show that the mantissas of the sequence $\{\log_{10} Q_1, \log_{10} Q_2, \ldots, \log_{10} Q_N\}$
are uniformly distributed in the interval $[0, 1)$ in the limits mentioned. Before we give the
general condition, we can see how this limit is achieved. Assume that the Gaussian function
given by Eq. (8) is already wide enough that it covers several orders of magnitude,
or "decades", of the values of $Q$; see Fig. 2, where the decades are denoted by $P-2, P-1,
\ldots, P+3$. The mantissas of $\log_{10} Q$ are the fractional parts of its values within each interval $[L, L+1)$,
with $L$ an integer. Thus, we can "shift" the intervals of all decades onto a single one, thereby placing all
the mantissas within the same interval, $[0, 1)$. Adding up the contributions of all decades
yields an almost uniform distribution. This procedure is the same as considering the sum of
an infinite number of Gaussians, each centered at $(\log_{10} Q_0 - P)$, with $P$ taking all integer
values; in the limit $M \to \infty$, equivalent to $\sigma \to \infty$, one gets the exact result

$$\lim_{\sigma\to\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{P=-\infty}^{\infty} e^{-(\log_{10} Q - \log_{10} Q_0 + P)^2/2\sigma^2} = 1. \qquad (9)$$

FIG. 2: First panel, normal distribution of $\log_{10} Q$ covering approximately 5 decades. Second
panel, the mantissas of the normal distribution within one decade, i.e. in the interval $[0, 1)$; the
dotted line is the sum of only 5 decades, adding to 1 within 3 significant figures.

This proves that the mantissas are uniformly distributed in the limit, for normally distributed
logarithms. Although the previous result is strictly valid only in the limit $\sigma \to \infty$, the
convergence is extremely fast. For instance, for $\sigma \approx 1$, the sum differs from 1 only at about the eighth
significant figure. One finds strong deviations from the uniform distribution as $\sigma$ becomes
much smaller than 1, that is, when the Gaussian covers less than one decade.
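The speed of convergence in Eq. (9) can be checked by evaluating the periodized Gaussian directly. In this sketch (names ours), the infinite sum over $P$ is truncated to $|P| \le 50$, an assumption that is ample for $\sigma$ of order 1 or smaller, and the departure from 1 is measured on a grid of mantissa values:

```python
import math


def periodized_gaussian(x: float, sigma: float, center: float = 0.0, terms: int = 50) -> float:
    """Sum in Eq. (9) before the limit: normalized Gaussians of width sigma,
    centered at center - P for |P| <= terms, evaluated at the mantissa x."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return norm * sum(math.exp(-((x - center + P) ** 2) / (2.0 * sigma ** 2))
                      for P in range(-terms, terms + 1))


def max_departure_from_uniform(sigma: float, samples: int = 1000) -> float:
    """Largest |density - 1| of the mantissas on an evenly spaced grid in [0, 1)."""
    return max(abs(periodized_gaussian(i / samples, sigma) - 1.0)
               for i in range(samples))
```

Evaluating `max_departure_from_uniform` for $\sigma = 1$ and for $\sigma = 0.1$ exhibits the two regimes described above: a departure far below $10^{-7}$ in the first case, and one of order unity when the Gaussian covers only a fraction of a decade.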

On the other hand, since $\sigma$ depends not only on $M$ but also on the standard deviation $\sigma_0$ of
the distribution of logarithms of $R$, i.e. $\sigma \approx \sqrt{M}\,\sigma_0$, the convergence might be very slow if
the width, or support, of the distribution of $R$ itself is very narrow. As particular examples,
for $R$ taken from a uniform distribution in the interval $[1, 10)$, a value of $M$ below 10 (about 4 or 5)
suffices to converge to BL. Conversely, for $R$ in the interval $[5, 6)$ it takes
$M \approx 400$ to yield BL.
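The role of the width of the distribution of $R$ can be explored with a small simulation of Eqs. (6) and (7). The sketch below (names and sample sizes are our own choices) multiplies $M$ uniform factors in log space and reports first-digit frequencies; it is meant to probe the trend described above, not to reproduce the quoted values of $M$ exactly.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def product_first_digit_freqs(lo: float, hi: float, M: int, n: int,
                              seed: int = 0) -> list[float]:
    """First-digit frequencies of n products Q_i = R_1 ... R_M, Eq. (6), with
    each R_j uniform in [lo, hi) (lo > 0). The log10 of each product is summed
    as in Eq. (7), so large M cannot overflow."""
    rng = random.Random(seed)
    counts = [0] * 9
    for _ in range(n):
        log_q = sum(math.log10(rng.uniform(lo, hi)) for _ in range(M))
        m = log_q % 1.0  # mantissa of log10 Q_i
        counts[min(int(10 ** m), 9) - 1] += 1
    return [c / n for c in counts]


def max_deviation(freqs: list[float]) -> float:
    """Largest absolute difference from the Benford probabilities."""
    return max(abs(f - b) for f, b in zip(freqs, BENFORD))
```

With factors in $[1, 10)$ and `M = 5` the frequencies are already close to BL, whereas with factors in $[5, 6)$ and the same `M` every product lies in $[5^5, 6^5] = [3125, 7776]$, so the first digit 1 never appears. Larger `M` can be tried for the narrow case, at a proportionally higher cost.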

The result of this section, namely that products of numbers obey BL, is very robust
and general [6], in the sense that even if the distribution of the numbers $R$ lacks a second moment,
the distribution of the logarithm of $R$ may not [10]. This is because the logarithm function is very "slow" and
tends to smooth the original distribution. Moreover, even if the numbers $R$ are correlated,
the action of the logarithm and the limit of very large products (i.e. large values of $M$) may
again yield a normal distribution of the logarithms [11] [12].

B. Generalized geometric sequences of variables uniformly distributed. 

Here we consider a geometric sequence of products of the form

$$\{Z^{(1)}, Z^{(2)}, \ldots, Z^{(J)}, \ldots\} = \{R_1^{(1)},\ R_1^{(2)} R_2^{(2)},\ R_1^{(3)} R_2^{(3)} R_3^{(3)},\ \ldots,\ R_1^{(J)} R_2^{(J)} \cdots R_J^{(J)},\ \ldots\}, \qquad (10)$$

where the $R_j^{(J)}$ are numbers uniformly distributed in an arbitrary interval $[a, b]$, with $a$ and $b$
real positive numbers. This sequence obeys BL. Although this result may be generalized to
arbitrary distributions, we restrict the results here to uniform distributions. We note that
if $a = b$, the above sequence is a true geometric progression with ratio $a$. Thus, geometric
progressions also obey BL (except when $\log_{10} a$ is rational, e.g. $a = 10^L$ with $L$ any integer).

FIG. 3: Numerical analysis of the first 10,000 terms of one realization of a generalized geometric
sequence, as given by Eq. (10), for numbers uniformly distributed in the interval [1.0, 9.9]. The first
panel shows the uniform distribution of $\log_{10} Z$ covering more than 40 decades. The second panel
shows that the distribution of mantissas is uniform in $[0, 1)$. The third panel is a comparison of the
exact Benford Law (circles) with the distribution of the first digit of the 10,000 terms considered
(triangles).

Again, we first consider the sequence of the logarithms of the products, $\log_{10} Z^{(J)}$. Since
we do not have an analytic demonstration, we resort to a numerical one. In Fig. 3, a
particular example shows that the distribution of $\log_{10} Z^{(J)}$ becomes uniform
as $J \to \infty$. As this distribution covers many decades of $Z^{(J)}$, the mantissas of
$\log_{10} Z^{(J)}$ obviously also become uniform in $[0, 1)$. A numerical comparison with BL is also included.
We have extensively verified that these results hold for any sequence of this type, including
true geometric progressions [7].
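A numerical experiment in the spirit of Fig. 3 can be sketched as follows. Two cases are covered: the true geometric progression ($a = b$), and a random-ratio variant in which each term multiplies the previous one by one new factor uniform in [1.0, 9.9]. Whether this running-product construction matches the one used for Fig. 3 is our assumption, so it should be read as an illustration of the same idea rather than a reproduction. The helper name is ours.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def freqs_from_log_steps(steps) -> list[float]:
    """First-digit frequencies of the running products Z^(1), Z^(2), ...,
    given the log10 of each successive factor; working with logs avoids overflow."""
    counts = [0] * 9
    log_z = 0.0
    n = 0
    for s in steps:
        log_z += s
        m = log_z % 1.0  # mantissa of log10 Z^(J)
        counts[min(int(10 ** m), 9) - 1] += 1
        n += 1
    return [c / n for c in counts]


# True geometric progression 2, 4, 8, ...: log10(2) is irrational.
geometric_2 = freqs_from_log_steps([math.log10(2.0)] * 10_000)
# Ratio sqrt(10): log10(a) = 1/2 is rational, so BL fails.
geometric_sqrt10 = freqs_from_log_steps([0.5] * 1_000)
# Assumed random-ratio variant: one new uniform factor in [1.0, 9.9] per term.
rng = random.Random(3)
random_ratio = freqs_from_log_steps(math.log10(rng.uniform(1.0, 9.9)) for _ in range(100_000))
```

The ratio $\sqrt{10}$ shows why a rational $\log_{10} a$ must be excluded: the mantissas cycle through 0 and 1/2 only, so only the first digits 1 and 3 ever occur.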



C. A conjecture on the general case. 

From the above two cases, it appears that a generalization is as follows: as long as the
distribution of logarithms is wide enough, namely, covering many decades of the set
considered, the mantissa distribution will tend to become uniform. An analogous
argument was recently used by Fewster [2] to illustrate when Benford's law should be obeyed.
IV. WHY DO THE PAGES OF LOG TABLES WEAR OUT FOLLOWING BL?



Simon Newcomb begins his article by pointing out that the log tables were more worn out
at the beginning than at the end, i.e. following BL. That is, since the tables are for
logarithms of numbers going in order from 1.000... to 9.999..., he found that the pages
for numbers starting with 1 were more used than those for numbers starting with 2, etc.
Newcomb explained this observation by assuming that "natural" numbers, i.e.
those appearing in Nature, were obtained as ratios of other numbers. Then, he argued that
no matter the underlying law of the primitive numbers, their ratios (in the limit of many
ratios) had the mantissas of their logarithms uniformly distributed. He then simply stated
that this implied Benford's Law. As we have seen, the mantissa rule is equivalent to GR. It is
fairly evident that Newcomb knew this result, and thus that he must be credited
with the derivation of BL. We mention, once more, that the arguments given in this article
are essentially contained in Newcomb's original paper.

An interesting aspect is why Newcomb considered that "natural" numbers were the result 
of ratios, or products for that matter, of other numbers. In the light of the previous sections 
and a bit of second-guessing, we can advance an explanation for this assertion by Newcomb. 
Moreover, this may also well be the explanation for the agreement of actual real-life data 
with BL. 

To begin, we should recall why log tables were used in the first place. We are well into
the era of electronic calculators, be it a pocket-size one or a huge supercomputer: numerical
calculations are now their task, not ours. But as recently as the early 1970s, not to mention
in the 19th century, numerical calculations were done by hand and/or with slide rules, and
log tables were essential to carry out those tedious and lengthy tasks. As a matter of
fact, logarithms were invented (or discovered?) by John Napier in 1614 precisely to perform lengthy
calculations! In the words of Napier himself [?],

Seeing there is nothing that is so troublesome to mathematical practice, nor that doth 
more molest and hinder calculations, than the multiplications, divisions, square and cubic 
extractions of great numbers. ... I began therefore to consider in my mind by what certain 
and ready art I might remove those hindrances. - John Napier, Mirifici logarithmorum 
canonis descriptio (1614). 

That is, the trouble appears when one must make calculations by hand, especially multiplications,
of numbers with many digits. It is lengthy, tedious and prone to mistakes.
Thus, one goes to the tables to find the logarithms of the numbers involved, performs
sums and subtractions, which are much easier, and then, by taking antilogarithms, finds the
result. The point is, where did those long numbers come from? They were definitely not
made up, nor read off from somewhere else, nor measured. The long numbers came
themselves from multiplications, divisions or powers of smaller numbers. The latter may be
random, or measured, or taken arbitrarily from somewhere else, indeed. But, we insist, the
long numbers did arise from operations performed on smaller numbers. As we have seen in
the previous section, products of numbers typically tend to BL, even if only a few factors
are involved, as long as they arise from wide distributions. In other words, the numbers
whose logarithms people looked up typically already obeyed BL. Since in the 19th
century numbers were not churned out by a computer but arose from arithmetic operations
performed by real people, it seems to the author that for Newcomb these were "naturally"
produced. This may also explain why many sets of real-life data obey BL: unless
one asks a computer for a random number, numbers that quantify a property, be it the area
of a lake or the weight of a molecule, usually arise from arithmetic operations performed on
measured quantities with arbitrary constants and units.